CfPA 95-th-35/Submitted to Ap.J 



Unbiased Cluster Lens Reconstruction 

Gordon Squires 1,2 and Nick Kaiser 3 

ABSTRACT 

Weak lensing observations measure the shear field j a , and hence the gradient 
of the dimensionless surface density k. We present several new algorithms to 
recover k from shear estimates on a finite region and compare how they perform 
with realistically noisy data. The reconstruction methods studied here can 
be divided into two classes: direct reconstruction and regularized inversion 
techniques. The direct reconstruction techniques express the surface density as 
a two-dimensional integral of the shear field: ft(r ) = / d 2 rK a (f; fo)j a (r). This 
allows one to construct an estimator for discrete sum over background 

galaxy ellipticities which is straightforward to implement, and allows a rigorous 
yet simple estimate of the noise arising from random intrinsic background 
galaxy ellipticities. We study three types of direct reconstruction methods: 1) 
K-estimators that measure the surface density at any given target point relative 
to the mean value in some reference region 2) a method that explicitly attempts 
to minimize the rotational part of Vk that is due to noise and 3) a novel, 
exact Fourier-space inverse gradient operator. We also develop two 'regularized 
maximum likelihood' methods, one of which employs the conventional discrete 
Laplacian operator as a regularizer and the other uses regularization of all 
components in Fourier space. We compare the performance of all the estimators 
by means of simulations and noise power analysis. A general feature of these 
unbiased methods is an enhancement of the low-frequency noise power which, 
for some of the methods, can be quite severe. We find the best performance is 
provided by the maximum likelihood method with Fourier space regularization, 
although some of the other methods perform almost as well. 
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1. Introduction 



Clusters of galaxies, acting as gravitational lenses, introduce a statistical anisotropy 
in the shapes of faint background galaxies. In the weak distortion regime it is possible to 
construct (see e.g. [Kaiser, Squires fc Broadhurst 19951) a 'polarization' statistic e a — a 
certain measure of the ellipticity of the background galaxy — whose expectation value is 
proportional to, and therefore provides a direct measurement of, the gravitational shear 
i e a) = la = {(0,ii — 0,22)/2, 4>,ij}, where <fi is the surface potential for the lens. The 
precision with which one can determine 7 a is of course limited by the number of background 
galaxies, and care must be taken to allow for systematic bias from seeing and removal of 
artificial anisotropy, but several groups have now shown that the shear can be detected 
to a reasonably high level of significance. The list of shear fields mapped around clusters 
includes A1689 QTyson et al. 199Q , [lyson fc Fischer 1995| ), A2218 ( [Squires et al. 1995| ), 
C11409+52 ([lyson et al. 1990|) , MS1224+20 flKahlman et al. 1994ft , C10024+17 flBonnet e\ 
al. 1994! , [Mellier et al. 1994! ) , C11455+22 and C10016+16 (|Smail et gTT99g , [Smail et a, 



19951) . 



The lens surface potential is related to the dimensionless surface density k = S/E crit 
by V 2 = 2k. In an earlier paper (Kaiser & Squires 1993; hereafter KS93) we proposed 
a method for reconstructing the surface density from the shear. Writing the shear as 
la = D a K, where the 2-component integro-differential operator is V a = {df — d\, 2<9i<9 2 }V~ 2 
one can readily show that k = T> a / -f a . The Green's function for this operator is — Xai^)/ 7 ^ 
with Xq(^*) = { r i — r 2i 2r!r 2 }/r 4 , so we obtain 



1 



7T 



«(^o) = / d rx a (r- r )^ a (r) 



(1) 



and this 2-dimensional convolution integral can in turn be replaced by a discrete sum over 
the background galaxies to give the estimator 



k(?o) = ~— J2 Xa(r g - f )e a 

n1X 3 

where n is the mean surface number density of the background galaxies. 



(2) 



Unfortunately the kernel Xa is infinite in extent, so this requires, strictly speaking, 
data extending to infinity. With finite data the estimate as written above becomes biased, 
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as discussed in KS93 and illustrated in the simulations presented there. Near the center of 
the field on which we have data, this simply results in a nearly constant suppression in the 
reconstructed surface density, and since the baseline surface density is ambiguous in any case 
this is relatively benign. However, near the edge of the data the shape becomes biased and 
one typically finds the density goes negative in a trough around the cluster and then rises 
again as one approaches the edge. The algorithm effectively tries to generate a lens that has 
a shear pattern like that observed within the region surveyed, but which falls to zero outside 
the data boundary. One immediate consequence of this is that there must be zero net 
mass in the reconstruction. Because of this we have tended to augment our 2-dimensional 
reconstructions by aperture mass measurements ( |Fahlman et al. 1994| , |Squires et al. 19951 ) 
which do not suffer from this bias. 

This raises the interesting question: since the shear is a non-local function of the surface 
density — so the shear within some region is determined in part by the surface density 
outside — is it possible to uniquely determine k from local measurements? The answer 
is affirmative, subject only to the qualification that one can add an arbitrary constant to 
K. One way to see this is from the expression for the angular gradients of k in terms of 
gradients of the shear (Kaiser 1995): 



Vk = 


«,1 




7l,l + 72,2 




. K ,2 . 




. 72,1 - 71,2 . 
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(3) 



This relation follows directly from the expressions for k and j a in terms of 4> above, 
and 'projects out' a certain linear combination of the shear gradients which are locally 
determined. Thus with perfect data we could determine k at any point relative to the value 
at some arbitrary reference point simply by performing a line integral of the observable 
Vk. This also provides an alternative way to visualize the convolution integral equation 
(HI); if one averages over radial line integrals from points on some very distant boundary (on 
which we assume that 7 and k vanish) to a 'target' point r*o, we obtain a two dimensional 
integral, and integrating by parts to express everything in terms of the shear rather than its 
gradients we obtain equation ([[]). Equation (|3|) is valid in the weak distortion regime — the 
main focus of this paper — but can be readily extended into the strong-distortion regime 
(Kaiser 1995). 

The above argument gives a qualitative insight on how to account for the bias in the 
original algorithm and several groups have exploited the above relations to develop unbiased 
surface density estimators flSchneider 1994| ; |Kaiser et al. 1995] ; [Schneider fc Seitz, C, 1995 



Seitz, C, fe Schneider 1995] ; |Seitz, S., fc Schneider 1995]) . In this paper, we address the 
reconstruction problem from a variety of perspectives. First we expand on some simple 
alternative solutions to the finite-field problem where we determine the surface density 
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relative to some well-defined reference region as sketched in Figures |T|a,b,c. In §|2| we show 
that in general, if one uses a line integral averaging scheme then one can formulate an 
expression for k relative to the appropriate reference value as an integral like equation ([I]), 
but with Xa{r '— rb) K a (f; fb). The benefits of this approach are twofold: By performing 
the line integrations and averaging analytically we are able to construct practical estimators 
as simple discrete sums over the background galaxy shear estimates, so with this method 
it is relatively straightforward to obtain rigorous estimates of the statistical uncertainty in 
the reconstruction arising from random intrinsic ellipticities. Second, the analysis reveals 
how problematic boundary terms (^-function terms in the kernel K a ) inevitably arise if 
one restricts the reference region geometry to a 1-dimensional line, and how they can be 
avoided. 



Another inversion algorithm has been proposed ( fSeitz, S., fc Schneider 1995| ) that is 



not only unbiased, but also attempts, in some sense, to minimize the contribution due to 
noise. Since the gradient of the shear is related to Vk, in the absence of noise, the curl of 
Vac should be zero. In practice, noise adds a rotational part to the surface density gradient 
and the Seitz & Schneider algorithm attempts to minimize this. We show however that 
this technique also gives rise to boundary terms which tend to inflate the noise in the 
reconstructions and give quite similar results to other methods that give large weight to 
galaxies near the boundary. 

In §|3] we develop a new discrete Fourier transform based inverse gradient operator 
which can be implemented with an fast Fourier transform (FFT). Because of periodic 
boundary conditions, applying the familiar algebraic inverse gradient operator 1/k (or its 
generalization for finitely spaced data) does not recover the true surface density from V/c. 
In fact, for zero padded data, one recovers precisely the bias inherent in the original KS93 
method. We show how to remove this bias and derive a Fourier-space inverse gradient 
operator which exactly recovers a scalar field from its gradient. Unfortunately, this method 
does not seem to be optimal in terms of its low frequency noise, but the method has the 
advantages that it is simple, fast, and can readily be extended to apply in the non-linear 
regime where the data provide one with a map of Vlog(l — k) (where the low- frequency 
noise is not an important issue). 

Fourthly, we consider regularized inversion methods for constructing the surface 
density. In general terms, this approach arises from fitting a series of general models for the 
underlying surface density and attempting to determine which the most probable model for 
the true surface density, given a particular set of observations. This style of approach leads 
to two difficulties. First, if we fit a model with only a few free parameters, then we severely 
restrict the reconstructions to some basic forms - in effect the reconstructions will reflect our 
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prejudice for what we expect rather than allowing the data to determine the surface density. 
However if we allow the model to be very general, then we are essentially introducing a 
large number of free parameters and the inversion become ill-conditioned. We pursue this 
scheme of many-parameter inversion in §fO and §fO and explore two methods to regularize 



the inversion based on the maximum likelihood and maximum entropy techniques. 

The foregoing analyzes yield several estimators all of which are unbiased. We address 
the question of what estimator is most desirable in practice in §[|. They are really precisely 
equivalent for perfect data, and the only rationale for choosing one method is how well it 
performs with real, noisy data. For any particular quantity one chooses to measure from 
a reconstruction the question is well defined and one can objectively determine the best 
solution. The problem is deciding what are the quantities of interest, which is a somewhat 
subjective issue. Here we focus on the low frequency noise in the reconstructions. This 
seems reasonable since the whole purpose of these modifications to equation (P is to cure 
the bias which is essentially a low frequency phenomenon. To explore this we first make 
simulated reconstructions using a variety of geometries. We then make a more objective 
and quantitative comparison by seeing how well these methods perform for particular 
low-frequency measurements. 

In ^ we derive some useful results for estimating the mass within an aperture. While 
this can be done by performing 'aperture densitometry' on the reconstruction, we show 
that with a simple modification of formula for one of the direct reconstruction methods, 
one can measure the aperture mass as a single summation over the shear estimates. This 
greatly simplifies the estimation of the statistical uncertainty. How to do this for a circular 
aperture has been described elsewhere ( [Kaiser et al. 1995| ) and here we generalize this to 
obtain a useful bound on the mass within apertures of arbitrary shape. 

Finally, we consider what appears at first sight to be a quite different approach to this 
problem. Gauss' law in 2-D equates the mass inside a loop and the integral of the normal 
gravity around the loop. Since the shear is the spatial derivative of the gravity one can 
thereby derive an expression for dK/d In A as a certain integral of the shear around the 
boundary. This provides a generalization of equation (^) to non-circular loops. Our hope 
was that this would give a different estimator — the construction we use in §|2] being highly 
and arbitrarily restricted to averages over straight line integrals — but it turns out that the 
result is exactly equivalent to our case-b estimator (see below) and so we consign this to an 
appendix. 



2. Finite Field Kernels 



From equation (|3|) we see that it is technically possible to determine k at any point 
relative to the value at some arbitrary reference point simply by performing a line integral 
of the observable Vk. Clearly then, one could recover k by simply averaging / dl ■ Vk over 
radial lines from the boundary of the observed region, as suggested by Kaiser et al. (1995) 
and illustrated schematically in Figure |l|a, and an algorithm to implement such a scheme 
was developed by Schneider (1994). 

Schneider's algorithm evaluates the line integrals numerically. To do this the shear 
estimates are first binned into grid of cells and then smoothed. The gradients of the shear 
(and hence gradients of k) are approximated by discrete differences on the smoothed shear 
field and Vk is then numerically integrated along a family of lines extending from each point 
on the plane to the boundary. Now the geometry for the line integrals is actually slightly 
different from that shown in Figure |l]a, but shares the key property that the boundary 
points are the same for each target point so, for perfect data at least, this procedure should 
generate an exact and unbiased reconstruction. 

The bias in the low spatial frequency behavior in the KS93 estimator is a real problem 
- especially if one is interested in the mass profile in the outer parts of cluster — and 
Schneider's algorithm is an interesting attempt to avoid this, but it has one somewhat 
disturbing feature. The line integrals of gradients of the shear give rise to integrals of 
the shear plus boundary terms. In the averaging over lines, the former become two 
dimensional integrals which are very similar in form to the convolution integral in equation 
(0), and are relatively straightforwardly implemented as discrete sums over galaxies. The 
latter, however, become line integrals of the shear around the perimeter. As we will see, 
these make an important contribution, especially to the low frequency components of the 
reconstruction, but they are very difficult to estimate reliably. The problem is that if the 
shear is smoothed then its value at the boundary will tend to be biased. On the other 
hand, if the smoothing radius is taken to be very small then the boundary shear must be 
estimated by averaging over a small number of galaxies and this will inflate the statistical 
uncertainty in the reconstruction. 

As discussed elsewhere ( [Kaiser et at. 19"95| ) there is a simple yet general relation 
between the mean tangential shear around a circular loop and the mean surface density 
within that loop. Defining the tangential shear as jt = — (7icos2yj + 72sin2<£>) — it 
measures the stretching of the galaxies along the loop — and the mean interior surface 
density as k = M/A, one finds 

d~K 



Fig. 1. — Schematic illustration of line integration paths used to construct the surface density 
estimators, a) Paths for the estimator of k relative to (k) the mean on the boundary, b) 
Paths averaged over if we measure k wrt K, the mean over the whole area surveyed, c) 
Hybrid scheme taking reference value k on a strip around the boundary of finite width. 
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This is easily derived from Gauss' law (see appendix [A]), and can been used to estimate the 
mass within a circular aperture ( |Fahlman et al. 1994| ). If we simply evaluate / da\A{pfj) 



out to some radius r we obtain the surface density at the origin minus 7c(r), the mean over 
the disk. Now by definition K = (2/r 2 ) / drr(K), where (k) denotes the mean of k on the 
loop, from which it follows that k — (k) — dn/dlnA. Comparing with equation we can 
see that if we were to change our reference surface density from k to (k) the only effect is 
to introduce the pure boundary term (tt)- 

Our analysis shows that it is technically impossible to measure the surface density 
relative to the mean on a boundary of infinitesimal width — which is perhaps hardly 
surprising — but allows the possibility that one measure k relative to a well defined 
reference strip of thin but finite width, which would give an approximation to Schneider's 
method but without the bias, or any similar alternative such as Figure |I|b. 

In this section, we construct expressions for k as averages of line integrals of Vk from 
a set of points uniformly distributed over some reference region for the various geometries 
illustrated schematically in Figure [I] (although the square geometry is only illustrative and 
the analysis below applies general survey geometries). Then, using equation (||), we obtain 
explicit expressions for k as some integral over the shear field which can then readily be 



converted in §2.5 to form practical estimators of k given a set of shear estimates. 



2.1. case a: 

We first construct an estimator for k — (k) where (k) is the mean surface density on 
the boundary of the data region. Place the origin of coordinates at the 'target point' r 
where we wish to measure k. Let the boundary of the region on which we have data be 
p((p), parameterised by azimuthal angular coordinate ip. We will assume that the entire 
boundary is visible from any interior point where we will attempt to estimate k. The mean 
of k on the boundary is (k) = J dipW(ip)K where W = \t\/L, with L = J dip\i\ the length of 
the perimeter, with tangent vector t = dp/ dip. Taking the average over radial paths to the 
boundary as in Figure Qa we find 

pfoO 

K— (k) — — J d(pW((p) J drr '• V 'n/r = - J d 2 rWf -V 'n/r 2 (5) 

o 

Substituting equation @ for Vk and integrating by parts to express everything in terms of 
7 rather than its derivatives we obtain 

«-<«) = / d 2 rK a ^ a - — j dlYafla (6) 
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where 



K a = 2Q a {W/r 2 ) 



(7) 



where we have defined the differential operator 





2 y£ + *l 



and where the loop integral is taken around the perimeter with 



a 



(Pxty+Pyt x )/p 2 

{Pyty -Pxtx)/P 2 



(9) 



With the origin placed at the center of a circular survey region we recover the KS93 kernel 
and, as expected, the surface term is just the mean tangential shear on the boundary. 



survey geometries. 

This is clearly very similar to Schneider's method. Let us clarify this. His method 
consists of two sequential operations on the data. The first is the smoothing of the shear 
field to produce a fine grid of 7 Q values. The second consists of differencing these binned 
values, grouping the various components together to make Vk and then integrating. Let us 
ignore the smoothing for now. Now the second step is just some linear operation on the 
input grid of shear values, and could be written out as a linear sum: k = J^Cala- What 
we have done above is to explicitly calculate the coefficients c a in this sum. Now in fact 
Schneider measures k relative to a slightly different average on the boundary, and uses a 
slightly different geometry for the rays along which the line integrals are performed, but 
these are irrelevant details. The worrying thing which emerges from our analysis is the 
presence of the boundary term, which means that the grid points on the boundary receive 
a very high weight in this sum. Now as we have seen, the contribution of this term to k is 
on the order of the mean shear on the boundary, which is roughly equal to k, so clearly 
the contribution to the low spatial frequency components in the reconstruction from this 
surface term is substantial, and since the goal here was to fix the bias in the low-frequency 
behavior of KS93, it is important that the method get this right. It is hard to see how this 
can be, since if the shear field is smoothed then the value on the boundary will be biased. 
One the other hand with a small smoothing length the boundary term must be calculated 
by averaging over a small number of galaxies and this will inflate the statistical uncertainty. 



Equations |6], [7|, [5] provide the generalization for points off the axis and/or non-circular 



Fig. 2. — The heavy loop represents the boundary of the region on which we have data. The 
target point lies at the origin of coordinates; p is the point on the boundary lying behind 
the measurement point f. The dotted lines are parallel to the normal to p and the tangent 
vector at p respectively, and the ellipse shows the orientation of the unit polar 7* whose 
major axis is rotated anticlockwise from the vertical by an angle (p + 9/2. Also shown is d, 
the perpendicular distance from the origin to the tangent vector. 
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2.2. case b: 

Let us now construct instead an estimator of k — K; where the zero-point is now set to 
be k = J cPrn/A, with A = J d 2 r. Taking the average over radial lines to the origin from 
each point within the survey region, as illustrated in Figure |]b, we have 

^ P(<P) r p(<p) 

K — k = — - J dtp J rdr J dr'r 1 ■ V/«(r')/r' = -^-^ J dip J rdr(p 2 /r 2 — l)r ■ V/t (10) 



This is very similar in form to equation [5], but now the function W is replaced by the 
function (p 2 — r 2 )/2A which vanishes on the boundary, so substituting from equation (^) 
and integrating by parts we obtain 

K-K = —fd 2 rK a j a (11) 

where now 

K a = g a (p 2 /r 2 ) (12) 



but no boundary term. 

A geometric picture of this kernel may be obtained as follows: In polar coordinates 
(r,ip) we find Q a = {l/2)R a p{2(p){rd/dr ) d/dip} where R a p ={{cos,— sin}, {sin, cos}} is the 
2-dimensional rotation matrix and, since p = p(<p), we have K a = — (p/r 2 )R a/ 3(2(p){p, p'} 
with p' = d\p\/dip. Now from inspection of Figure one can see that p' = ptanO, and 
the perpendicular distance from the origin to the tangent is d — pcos8, so we find 
K a = p 3/ -f^/r 2 d where the 'unit polar' is defined to be 



7 

and hence 



cos 2(^ + 0/2) 
sm2((p + 6/2) 



(13) 



k-k = - I d'rSr^ia (14) 



- fd 2 r p- 

AJ r 2 d 

This is very similar to equation ([]]). Both equations 'project out' a particular component 
of the shear. In equation ([]]) this is simply the tangential shear; i.e. the stretching 
perpendicular to the line from the origin to the measurement point. In equation ( |H| ) the 
component projected out measures the stretching along the direction which bisects the 
normal to the vector p and the boundary tangent direction. The orientation of 7* is shown 
in Figure @. 
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2.3. case c: 

We can now readily generalize this analysis to the case where the reference region does 
not fill the entire data region or has some complicated geometry so that lines from the 
target point may pass in and out of the reference region, perhaps many times. If we think of 
the ends of the lines in Figure |l|b,c as lying on a unit spaced regular 2-dimensional grid (and 
let the unit length be very small) then A(n — k) is just the sum over line integrals. Clearly, 
if we excise part of the reference region say, we must simply remove the contribution from 
all the paths which originate in the excised region. In the general situation a line from the 
target point (with some given azimuthal angle <p) will pass through the boundary of the 
reference region (which need not be simply connected) at points p\ . . .p n . Let the points be 
sorted in order of increasing distance so the most distant point is p n . The mathematical 
expression of the subtraction described above is the generalized kernel 

K a = J2S i e(r-p l )g a (p^/r 2 ) (15) 

i 

where S n = 1, and S n -i = — S n and where Q(x) is the Heaviside function. The surface 
density is then given by equation QTTJ) as before, but with A now the area of the reference 
region. 

This is illustrated in Figure [| for the case of a simply connected reference region which 
only partially fills the region on which we have data. If the target point r*o lies inside the 
reference region (upper plot) then the line through r crosses the boundary just once and we 
have 

K a = \ G ^ 2/r2) for{ r<P (16) 
^ o y p < r 

If the target point lies outside the reference region then either the line from tq though f 
misses the reference region completely (in which case K a = 0) or it crosses it twice at pi 
and P2 as illustrated in the lower panel, and we then have. 

K a = { G Q (p 2 2 /r 2 ) for { Pl <r< p 2 (17) 

We can now easily construct the kernel for case-c, where we have two nested boundaries; 
we simply subtract the kernel calculated as above for the inner boundary from that for 
the outer. This also allows the possibility that the outer boundary might lie inside the 
actual data boundary. Case-c approaches case-a in the limit that the reference strip is 
infinitesimally thin. If we use a narrow but finite width strip we should obtain something 
very similar to but without the bias inherent in Schneider's method. 





Fig. 3. — Generalization of case b when the reference region only partially fills the region on 
which we have data. The form of the kernel is shown for the various possible cases when the 
point in question lies inside the reference region (upper panel) or outside it (lower panel). 
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These formula were derived for the special case where the spatial origin coincides with 
target point. For a general choice of origin we have 

re(r Q ) - 7c = — J d 2 rK a (f; f )^ a (f) (18) 

where, for case-b for example, K a (f;f ) = G a (p 2 (r, r )/(r — r ) 2 ), where we have explicitly 
shown the dependence of the boundary point on the target and measurement points. 



2.4. Example Kernels 

An illustrative and interesting case is that of the case-b kernel for a rectangular survey 
geometry. Erecting lines from the target point (the point at which we hope to measure k) 
to the corners of the rectangle divides it into N,S,E and W quadrants. From equation ([14]) 
we find, in the N and S sectors respectively, K\ = (Y ^= Uo) 2 / {y — Ho) 2 where Y is half height 
of the rectangle, and K 2 = —(x — x )K 1 /(y — y ). Similarly, for galaxies in the E and W 
sectors we have K\ = — (X =p x) 2 / (x — x ) 2 and K 2 = (y — yo)K 1 /(x — x ) where X is the 
half-width. This kernel, is shown in the Figure |] for various target positions. The pattern 
is qualitatively similar to the KS93 kernel, which is also shown for comparison, but now 
depends in a non trivial way on f . Note also the discontinuity along the divisions between 
the N,S,E,W sectors. As we will see, these give rise to spurious, but essentially harmless, 
linear features in the reconstruction. 

Also shown in Figure [| are the kernels for the case of circular reference region lying 
within the survey region (assumed square for simplicity). If we define /i = r ■ tq/ttq, and 



Z = \jR 2 /r 2 + /i 2 — 1 where R is the radius of the survey and let Q = Z — fi then for 
case-b, setting r' = r — r and dropping primes we have 



Zr 4 



x 2 — y 2 
2xy 



+ r Q 2 



Zr 3 



xx - yy 



(19) 



These formulae are only valid if both target and observation point lie within the survey 
disk, and must be modified if the target or observation point lie outside as discussed above. 

The case-b and case-c circular kernels are in each case very similar, and the main 
difference is just the extra term around the annulus. As one makes the width of the 
reference region smaller the kernel in the annulus increases in inverse proportion and so, in 
the limit of a very thin annulus we end up giving infinite weight to the infinitesimal number 
of galaxies lying in the annulus. 
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Fig. 4. — Plots showing the kernels for various reference region geometries and for various 
target positions. The leftmost pair of columns give the KS93 kernels Xi an d X2- The next 
pair show the case-b kernel for a circular survey geometry and the middle pair show the 
case-c kernel. The two pairs of columns on the right show the case-b and case-c kernels for 
rectangular geometry. The case-c kernels are very much like their case-b counterparts, save 
for the enhanced value within the reference annulus itself. As the reference strip becomes 
thinner this becomes stronger. 
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2.5. Practical Estimators 

Given a catalogue of background galaxy ellipticities e a which, suitably defined 
(e.g. |Kaiser, Squires fc Broadhurst 1995| ), give a set of estimates of 7 a at their locations 
{f g }, we can form the estimator 

^ ~ ~A Eg^g (20) 

whose expectation value is just «(r ) minus the appropriate reference value, and where {w g } 
are a set of weights that we can tune to optimize performance. Alternatively, one could 
bin the galaxies on some grid, calculate the mean shear estimate for each bin, and then 
evaluate / d 2 r K a ^ a as a discrete sum with appropriate weights. These two approaches are 
equivalent in the limit of small bin size. For uniformly sampled points (say on a fine grid) 
the weights should clearly all be equal. Given some random realization of background galaxy 
positions the constant weight estimator is no longer ideal: the kernel we are effectively 
applying is nK a /n which differs from the true kernel because of the fluctuating density of 
the background galaxies. A uniform weighting scheme therefore introduces a multiplicative 
element of noise, which can be removed by weighting each galaxy in inverse proportion to 
the local density of neighbors. However, the additive error arising from the random intrinsic 
ellipticities will consequently increase. The optimum weight therefore depends on signal to 
noise, with constant weight preferred in low signal to noise situations. For typical clusters 
the distinction between these is minor as to get into a regime of high S/N one really has to 
look at large-scale features and then the Poissonian fluctuations in the background galaxy 
number density are small. Henceforth we will consider the uniform weight estimator 

K(r ) = J2 K ^(r g ; f )e a (21) 

What about allowance for contamination by cluster galaxies? If one assumes that 
they simply add a randomly oriented component to the background galaxy population 
then one should simply use equation ( 21]) as it stands but with n determined away from 



the cluster. If, however, the light from the cluster galaxies masks a significant fraction of 
the background galaxies one should boost the weight given to the surviving galaxies by an 
appropriate amount. 

Note that all the the kernels are strongly divergent at r — > 0, naively suggesting a 
logarithmic divergence in the kernel integral if we simply count powers of r. However, 
this divergence is not real. Consider the case b kernel given in equation flT2|). If we make 
a Taylor expansion of 7 a about the origin then it is only the zeroth order term which 
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threatens to produce a divergence, gradients or higher order terms being clearly convergent. 
For 7 a = constant, we can integrate equation ( [TT]) by parts. If we restrict the integration to 
some patch around the origin we find 

/ d 2 rK ala = J dy{xry 1 + y l2 ) \p 2 /r 2 \l\ , * 

+idx(x l2 - yil )[pyrX [ZZ) 

This is a boundary term which vanishes identically if we take the boundary to be p 2 /r 2 = 
constant. Thus, if we define the integral in equation (|TI|) as the limit as e — > of J r>ep d 2 r . . . 
it is then clearly non-divergent. 

A closely related issue is the question of noise in the surface density estimator given 
in equation ([21]) arising from the divergence of the kernel. The reconstruction is the sum 
of random 'shots' convolved with the kernel, now considered a function of r$. Clearly, if 
we make an unsmoothed reconstruction we will find divergent 'butterfly' patterns at the 
location of each galaxy. In the case of the KS93 kernel it is easy to see that this noise is 
a purely high frequency phenomenon and goes away on smoothing the final density field. 
The reason for this is that the kernels xu X2 vary as cos2y9, sin2<^ so the coupling to long 
wavelength Fourier modes is highly suppressed. Here essentially the same is true, but we 
have to be a little careful. In the KS93 estimator one can simply soften the divergence 
by replacing 1/r 2 — > l/(r 2 + r 2 ) for some tiny softening length which merely suppresses 
overshoot in the computer. Were we to do that here, there would be a divergent coupling 
to low frequency modes because of the asymmetry of the kernel pattern which, in reality, 
would be cut-off by the finite pixel size in the reconstruction. The answer is to soften the 
kernels by multiplying by some function W(r/ep) where e is a small parameter and where 
W(x) —>■ for 1 < 1 and W(x) — > 1 for x ^> 1 (the function used in the simulations shown 
below was 1 — exp(— x 3 )). In principle, the long wavelength behavior of the reconstruction 
should be independent of the parameter e provided it is 'sufficiently small'. A slight 
technical problem with this is that the scale length of the softening becomes small for those 
galaxies near the boundary and so quite a small pixel size is required in order to resolve 
this properly. 



2.6. Minimizing Noise Contributions 

Another way of approaching the problem is to notice that from equation (|), V x Vk 
= in the absence of noise. Hence for real, noisy data, a sensible suggestion is that 
the rotational component of the gradients of the shear field should, in some sense, be 
minimized. Seitz and Schneider (1995) show that by adopting two reasonable assumptions: 
1) the rotational component should vanish if the shear is a gradient field and 2) there is no 
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systematic rotation over the data field, then there is a unique specification for decomposing 
the gradient of k into a gradient and rotational part. The details are involved but the main 
result is that this gives an expression for k in the familiar format as a convolution of the 
gradients of the shear with a kernel: 

K{f Q ) - R = J d 2 rK a (f; n>)u a (r) (23) 

where u a (r) is defined in equation @. 

The problem arises when we attempt to express this with respect to the shear, instead 
of its gradients. In solving for the kernel, Seitz and Schneider are effectively solving a 
Neumann problem, where the kernel is a gradient vector field with K a (f; fo)n a {r) = on 
the boundary, and where n(r) is the unit normal to the boundary. Integrating equation 
(^3|) by parts, we obtain an estimate for the surface density as a convolution of a kernel 
with the shear estimates plus a boundary term, much like our case-a estimator. 

One might expect then, that the limitation of this method is very similar to the case-a 
estimator (or thin reference region case-c): to estimate the gradients of the shear field, it 
is necessary to introduce some smoothing. This will bias the estimate near the boundary, 
or inflate the noise if the smoothing is reduced. The latter is particularly problematic. The 
presence of a boundary term tends to give very large weight to a few galaxies near the 
boundary and this introduces a worrisome low frequency fluctuation in the reconstructions 
- yet it was just this feature we are attempting to reduce in considering alternatives to the 



original KS93 estimator. We return to this problem more quantitatively in §5.2. 



3. Direct Fourier Reconstruction 

We can construct a binned estimate of V7 Q by binning 7„ values on regular grid and 
then discrete differencing. Hence we can obtain binned estimates for Vk, via equation (|3|). 
The finite field methods have attempted to determine k by averaging over line integrals. 
An alternative method to determine k is obtained by using Fourier transforms. The clear 
advantage here is that the gradient operator in real-space becomes algebraic in k-space. An 
added bonus is that fast Fourier transforms are computationally cheap to perform so the 
inversion is easy to do numerically. We display such an implementation scheme here. 

Consider first, for clarity, a 1-dimensional process, and imagine we have some discrete 
process f[n] (this might be a discretely sampled continuous field: f[n] = f(nAx)) and make 
the discrete difference 

V[n] = f[n + l]-f[n] (24) 
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Now we can construct the Fourier transforms (FT) of /, r), for any samples of finite length 



we recover the former? 

Now for continuous fields f(x), r](x) = df/dx, the derivative operator in Fourier space 
is just multiplication by ik, so we recover the FT of / by dividing FT of 77 by ik. This is 
not defined for k — 0, so this tells us we can not recover the DC component - again, this is 
not problematic as the DC level can not be determined from shear measurements alone in 
any case. 

It is well known that with for discrete FTs the discrete difference operator is 
multiplication by e~ 2mk ' N — 1 rather than ik, but there is a little more to it than that 
because of the periodic boundary conditions: if one simply multiplies f[k] by e~ 2mk l N — \ 
then one obtains the FT of the derivative of f[k] but taken with periodic boundary 
conditions which differs from the real discrete derivative of f[k] at the end points. 

The actual relation between f[k] and fj[k] is easily obtained: 




n=l 



The question is, how are f[k] and rj[k] related? In particular, if we are given the latter, can 



TV N 



fj[k] 



v[n}e 2nink/N = J2(f[n + 1] - f[n])e 2mnk/N 



n=l n=l 
N+1 N 




2iri(n-l)k/N _ r[ l 2-nink/N 




n=2 n=l 

N 



f[N + 1] - f[l] + ( e - 2 * ik ' N - 1) E f[n}e 2nink/N 




Y.vin] + (e- 2 ^ N -l)f k 



r][0] + ( e ~ 2mk/N - l)f[k] 



so the solution of our inversion problem is 



/[*] 



e -2wik/N _ ]_ 



(26) 



As before, this is not defined for k — 0, but also differs from usual formula 
fj[h\l{e-* Kik l N - 1) for k ^ 0. One way to implement this is simply to modify r)[N], the last 
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element of the rj[n] array, to be minus the sum of the other elements. That is, we construct 



N 



rj'[n] = 77 [n] - 5 nN ^ rj[n'] = 77 [n] - 5 nN fj[0] (27) 



n'=l 



which then has discrete transform fj'[k] = fj[k] — fj[0]. 

The above analysis can be extended to the case of 2 dimensions, as is applicable for 
our problem. We can estimate directly from the data the two partial derivatives K tX and 
K :V and, in principle, we can use either to estimate k. Fourier transforming, we obtain two 
measures of k: 



K a[t) k 



Hi 



k>,x \kxi ky\ 

,2mkx/N x _ I 

Kb[k x ,ky\ = ~ iky/Ny _ (28) 



e 



As with the KS93 method expressed in Fourier space these two estimators have 
noise which varies strongly with direction in fc-space, so we combine these with weights 
w a = kl/k 2 , Wb = kl/k 2 , which, for high frequencies at least, gives the optimal combination: 

k 2 k 2 
K,[k x , k y ] = —K a [k x , k y ] + — Ki>[k x , k y ] (29) 



It is very simple to implement this technique. One simply bins the shear estimates 
onto a grid (this can be very fine as the FFT is very fast, and the fact that most of the 
grid cells are actually empty causes no problem). We then apply the two discrete difference 
operators to this grid of shear values and then combine the gradients according to equation 
(0) which, for a very fine grid, produces an image (or rather pair of images; one for each 
component of the gradient) which is largely zero and has a little L-shaped pattern at the 
location of each galaxy and we then simply apply the FFT, calculate R according to the 
two equations directly above, and then inverse transform. The results from this are very 
hard to distinguish visually from e.g. our other best methods, but, as we shall see, the noise 
power analysis suggests that the low-frequency performance of this method is somewhat 
worse than the best methods. 

Doing the inversion in Fourier space also leads to the attractive possibility to perform 
the non-linear lensing inversion. It is possible to construct the quantity Vlog(l — k) from 
observable quantities (Kaiser 1995), in exact analogy with equation (|3[). Thus simply 
replacing k by log(l — k) in the equations above gives a prescription for doing the inversion, 
even in the non-linear regime. 
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4. Regularized Inversion Methods 

From the finite field 'direct reconstruction' methods, we obtain the k estimate as a 
simple convolution of the data with some geometric kernels. This is guaranteed to be 
numerically stable and unbiased and has well defined and calculable noise properties. 
These are all nice features but the methods explored all presuppose that we have uniform 
sampling of the data. If one has highly inhomogeneous sampling (and perhaps large holes 
in the data), one could imagine modifying the estimators derived here by bending the paths 
to avoid holes or noisy regions, but it it not clear that this is the best thing to do. A 
closely related issue is that we have concentrated on using the shear data on some area to 
determine k on the same region, yet (as our aperture mass statistics clearly show) the shear 
is also rich in information on k outside the data region. 

A quite different approach that can address the above problems is to attempt to 
find the most probable solution for the underlying surface density, given the set of 
observed shear estimates. To determine this, we construct the log-likelihood of a model 
C(k) = log p(data| model). An apparent problem is that 7 is determined in part by k values 
outside the grid. One solution would be to extend the model to include more grid points, 
but this is somewhat costly. Here instead we exploit the local expression for Vk in terms of 
V7 Q in equation (|3[) 

For a k field which is smooth on the scale of the grid, a grid of estimates of Vk 
(obtained from the measured shear values by applying an appropriate discrete difference 
operator form of the rhs of equation (§)) should be equal to the same operator applied to 
the model K-grid with gaussian distributed, but non-trivially correlated, gaussian residuals. 
What we are doing is using the discrete difference form of equation (|3]) to project from 
the data that part which is locally determined, and then solving for the most likely 
^-configuration assuming we were provided only with this reduced from of the data. We 
present two methods to do this. In §fOJ we calculate the most probable k field assuming 
a gaussian prior. This is similar to the method applied by Kaiser and Stebbins (1991) 
to reconstruct the local density field form peculiar velocity data. We also reformulate in 



real-space the problem in terms of a standard maximum likelihood analysis in §4.2 



4.1. Maximum Probability Method 

We approach the problem from the following perspective: given a set of observations 
(e.g. shear estimates for N ga i galaxies), we want to fit a model for the underlying surface 
density. To be physically useful, the model must be as unrestrictive and general as possible 
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We take 

n(r) = a k cos(k • r) + 6 k sin(k • r) (30) 

where we assume the surface density is periodic inside some box of side L - this is 
nonrestrictive as we can always zero-pad our data region to impose periodic boundary 
conditions. Allowing m Fourier modes in our model, we rewrite equation (RUT) as 



2m 



(31) 



We have complete freedom in determining to Fourier coefficients Cj. A standard approach 
is to find the most probable set Cj given the observed data. This implies we minimize 

Ngal i r 2m 



X' 



07 



^ _2 Tai 5Z c jYj{ r i)Xaj 



with respect to so that 







1=1 U l 



2m 



To 



^ C jYj( T i)Xaj 



Y k {ri)x a k 



(32) 



(33) 



for a = 0, 1 which labels the two shear components. This becomes a matrix equation 
(A T A)c = A T d where = Yj(rj)xaji /<J i is an ^gai x 2m matrix and di = ^ai/oi is a vector 
of length 2m. To solve for the maximum likelihood c, we need to invert a 2m x 2m matrix, 
which, in general, will be singular for any reasonably large number of k-modes. 

We regularize this inversion with a solution in the style of maximum entropy 
reconstructions. The probability of the model c given the observed data T> and some prior 
information / is 

P(c\I) 



P(c\VI) = P{V\cI)- 



P{V\I) 

and we want to maximize the probability to determine the "best fit" model for c. This 
analysis yields the same equations as before but we including some prior knowledge 
to constrain, and hence regularize, the solutions. For Gaussian uncertainties of the 
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measurements, the first term on the rhs of equation (34) is as before 

1 o. 



P(V\cI) oc exp( 



;x 



(35) 



We take as our prior model / the assumption that the Fourier coefficients are drawn 
from a Gaussian distribution with a power spectrum (a^) = (bl) = P k n . This adds a 
diagonal strip to the matrix equation so that 



N, 



(A T A)j 



Xj( r i)Xk( r i 



+ 



S 3h 



P(k) 



(36) 
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and hence the matrix inversion is non-singular for an arbitrary number of Fourier modes. 

This is very similar in spirit to the maximum probability reconstructions of the density 
field from peculiar velocities employed by Kaiser & Stebbins (1991). We adopt a white 
noise (n = 0) power spectrum - this is often viewed as a maximally non-committal choice - 
and allow the amplitude to be a varying parameter. Setting large amplitude for the power 
(i.e. large Pq) corresponds to little regularization, while a small power amplitude tends to 
heavily bias the recovered signal downwards. The optimal amount to use is determined 
empirically. We run extensive simulations for a given N ga i to test the sensitivity of the 
reconstruction to the amplitude of the power selected. We find that there is a plateau where 
the reconstruction is insensitive to the input power (the reconstruction is stable to variations 
about this amplitude). This is a very nice feature as we can tune the regularization so that 
the inversion is well behaved, yet we are not forcing the solutions to some particular form 
by the choice of our prior. 



4.2. Maximum Likelihood Analysis 

We now construct an algorithm in real-space to calculate the maximum-likelihood 
K-field configuration (considered as a grid of N x x N y values) given a set of shear 
estimates binned on a similar grid. We present the simple problem where we are given 
a binned set of shear values with equal noise. The generalization to unequal noise is 
straightforward and is not presented here. As before, we need to construct the log-likelihood 

— » 

C(k) = log p(data| model), exploiting the local relationship between Vk and observable 
shear estimates. 

We write the shear- vector estimates as the 2 x N matrix ^ ar where a — 0, 1 labels 
the shear component and r = N x y + x which ranges from to iV — 1 is a compound index 
labeling the position where N = N x N y and where the individual coordinate indices are 
x = 0, N x — 1 and y = 0,N y — l. Similarly we denote the model grid of surface density values 
by the vector k t . However, since the discrete differencing will only determine k modulo an 
additive constant we make K r of dimension N — 1. This effectively forces the corner element 
k(x — N x — 1, y — N y — 1) to be zero, though one is free to adjust the zero-point after the 
reconstruction to any chosen reference value. 

There are several possible ways to implement the discrete differencing operator. We 
will use 

df/dx -> for = Dorr'fr /^x 

df/dy -> fir = D lrr ,f r 

where the derivative fields f ir live on an interleaved M = M x x M y grid with Mj = N t — 1, 
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and where 



D() rr ' — ((5 X '_ x — 1,3/'— y— 1 S x ' —x,y' —y—l) "i" \$x' —x—l,y' —y ^x'—x,y'—y))/^ 
Dlrr' ((^st'— a;— y— 1 $x'— x— l,y'— y) [&x'—x,y'—y—l ^x'—x,y'—y)}/^ 



(38) 



The discrete difference form of the rhs of equation (|3|) is 



&0r — -Dorr'TOr' + -Dlrr'7lr' 
&lr = -Dorr'7lr' ~ Alrr'TOr' 



(39) 



Or Kj r = Qirjr'ljr' With 
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(40) 



while the lhs is K ir = D irr /K,' r , with the summation extending from r' = to A" — 2. 
The correlation matrix for the residuals is defined as 
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and is easily calculated since in the absence of any distorting influence (and assuming 
Gaussian distributed independent errors on the shear- vector estimates) (7j r 7 mr ') = a^Si m S r 
so 



Ci r jr' — Qirlr"Qjr'mr'"(llr"1mr'") — 8ijQorlr"Qor'lr" 

and so the log-likelihood is 



C(Kt) — —-C rr }(Qirl r ii-fl r " — Di rr nK r ii)(Q 



/r ' mr"'^1mr"' D^ r f r 'ff K r 



(42) 



(43) 



where we have defined the M x M matrix C rr > = Qo r ir"Q 



Or'lr"- 



Minimization of £ wrt n r results in A rr iK r > = B r where A rr > = D ir ii r Ui r n r i and 
B r = Vrir'lir' where we have defined Ui rr ' = C~ r },D ir ii T i and V r i r ' = Qir"ir'Ui r " r 

Unfortunately, the matrix A is singular; it has a null space consisting of quasi-oscillatory 
vectors, any combination of which dotted with A on left gives zero on the rhs. Adding a 
9-point Laplacian Tickhonov-Miller regularization (see Num Recipes, 2nd ed; chapter on 
inverse methods) seems to solve this nicely. We now maximize £ — aL rr 'K r 'L rr "K r /> /2 i.e. we 
add to the 'chi-squared' an extra term consisting of some multiple of the variance in the 
2nd derivative field L rr iK r >, and we now obtain the matrix equation 



(A + aL T L)k = B 



(44) 
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where a is the "trade-off parameter" and 



1 1 



L rr , = 95(r -r')- £ 6 ( r + N x y + x - r') (45) 

?/=— 1 x=— 1 

is the Laplacian smoothing operator (we define L rr / for all grid points r, but for for points 



on the perimeter, where the sum in equation (45) would extend outside the grid, we set 
L rr i = 0) . 

The effect of the regularization is to suppress the spurious 'null-modes' which have 
strong high frequency components. The trade-off parameter, a, can be thought of as a dial 
that either makes the solution very smooth (for large a), or allows the solution to fit all 
of the data points (for small a). For practical implementations, one wants to choose some 
intermediate regularization that fits the data acceptably well, while smoothing out some of 
the noise fluctuations and conditioning the inversion process. In simulations, we find that 
for a >~ 1 the result is a heavily smoothed reconstruction. For small a, the inversion is 
somewhat insensitive to the precise a value (unless it is so small that the matrix inversions 
blow up). This is an important feature - the answers given by this method will not be 
prejudiced by any reasonable choice for the regularization, and the user has some freedom 
in setting this parameter. 



5. Noise Properties 

So far we have constructed a whole family of estimators which all, given perfect data, 
return an exact reconstruction of the density field. The finite-field kernels each measure k 
relative to a different baseline, but are all really equivalent since we can always add any 
constant we like to force a particular normalization after the fact. For example, you might 
particularly want to measure the surface density in some region relative to its value in the 
reference strip shown in Figure 0c. You could of course use the case-c estimator without 
further ado. You could equally well use case-b, measure the value of the reconstruction 
in your chosen strip and then readjust the baseline. If this gives an estimate with lower 
statistical uncertainty then you should do so. 

Our estimators are weighted sums of observed background ellipticities; we think of each 
galaxy giving a noisy estimate of the shear at its location, the noise arising in part from 
the random intrinsic ellipticity and in part from the measurement error. The statistical 
uncertainty depends on how the weight is distributed among the background galaxies; 
generally speaking, the more uniform the weight the smaller the statistical uncertainty. 
Clearly the kernels are always rather sharply peaked towards the target point. This 
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generates a 1/r 2 divergence in the kernels. This is a common feature for all the methods 
and is also largely irrelevant since, as discussed, it generates predominantly high frequency 
noise which goes away when we smooth. Here we are more interested in the low spatial 
frequency behavior of the reconstruction since this is where the bias appears in equation (|T|). 
For random background ellipticities the reconstruction will be a superposition of 'inverse 
kernel' patterns. The low-frequency behavior of these patterns is somewhat different for 
the different methods. In case-c type methods, for a narrow reference strip, the galaxies in 
the reference region receive rather large weight, becoming infinite in the limit as the width 
tends to zero. Moreover, the value of the kernel in the annulus depends only weakly on the 
target position provided the target point is not close to the edge. Thus we would expect 
that for a random realization of reference annulus galaxy ellipticities one will generate a 
noise fluctuation of random amplitude, but which will be rather coherent and flat in the 
center of the reconstruction. 



5.1. Simulations 

As a first, qualitative test we can compare reconstructions made with simulated data. 
To make these simulations we generate random Gaussian variates 71, 72 for randomly 
scattered galaxies. The rms 1-component shear error is yj (7^) ~ 0.2. This seems to be 
quite a good value, determined observationally from field galaxies. We then add a small 
systematic shift calculated for a particular lens model. We place the cluster at z = 0.2 and 
the galaxies in a plane at z = 0.9. We display the reconstructed surface density using the 
various methods in Figure |5|. All of the reconstructions have been smoothed with a 2 pixel 
Gaussian. The algorithm of Seitz & Schneider was employed using circular geometry for 
calculational ease. 

Surprisingly, the bias in the original KS estimator is not very strong - even though the 
input mass distribution is not axisymmetric- and the bias seems only appreciable in the 
very corners of the reconstruction. 

The results are all somewhat similar in the signal recovery. Of course, the overall 
amplitude of the case-c type estimators is higher than for case-b but this is irrelevant as 
the baseline surface density is unmeasurable from the shear measurements alone. However, 
the the noise properties vary significantly among the methods. With the case-c estimator, 
the presence of the boundary term in the reconstruction gives rise to much higher noise, 
especially near the edge of the reconstruction. The same is true for the Fourier method. 
Qualitatively, one favors the methods that have the smallest noise fluctuations and the 
case-c estimators seem to perform the worst in these simulations. 
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While these simulations are very simplistic, they are indicative of the type of behavior 
we expect. The various estimation techniques are all very similar in the signal they recover. 
However, the case-c type estimators - estimators with boundary terms - tend to have 
the most obvious worrisome noise properties. In the next section, we explore this more 
quantitatively by calculating the statistical properties of the noise for the various types of 
estimators. 



5.2. Noise Power Analysis 

For sufficiently high galaxy number density we expect that the noise will approach a 
Gaussian process by virtue of the central limit theorem. In that limit everything we need to 
know about the noise field is encoded in its power spectrum or autocorrelation function. As 
discussed, we are primarily interested in the performance of these methods for recovering 
low frequency signals. We do this in two ways: the first is to calculate the noise-power 
for the various methods, and the second is to calculate the variance in a specific 'aperture 
mass' statistic. 

To estimate the noise power we have generated a large number of realizations for 
randomly oriented galaxies. For each of these we estimate the power and then average the 
power over the realizations. In doing this we came upon a slight technical problem; all of 
the finite-field methods tend to produce 'hot-spots' around galaxies close to the edge of the 
survey region. These tend to be worse for the case-c type estimator, but are present in 
all cases. If we naively measure the power over the whole survey area we find that these 
leak though into the low frequency modes (this is partly due to lack of resolution but is 
also partly a real, resolution independent, effect), artificially enhancing the low frequency 
power. This is somewhat discouraging, since the motivation for developing these methods 
was to cure the bias in the KS93 reconstruction, and we seem to have exchanged this for a 
lack of precision. 

To make a meaningful comparison we have cropped each image to remove a strip 16 
pixels wide around the edge. The low-frequency noise power is shown in Figure |6|. For each 
finite-field method we have chosen two different values (0.02 and 0.04) for the softening 
parameter e. As expected the low-frequency power is very insensitive to e (provided it is 
small at least). We have also calculated the noise power using the maximum probability 
method with k max = 10, and two amplitudes of the power amplitude that vary by a factor 
of five. We also display the results using the maximum likelihood method using trade-off 
parameters a = 0.001 and a = 0.005. 
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Fig. 5. — Simulation of cluster mass reconstruction. We have generated data by laying 
down randomly placed galaxies and assigning them ellipticities with a random component 
to represent the intrinsic ellipticity plus a systematic component for a model lens. The 
reconstruction is shown for 2500 galaxies at a = 0.9. The lens is a bi-modal softened 
isothermal like model at z = 0.2. The main difference among the methods is the noise 
properties induced by giving large weight to galaxies near the frame edge. 
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Our calculation of the noise power spectrum differs from that of Seitz & Schneider. 
In Figure 3 of their paper they make a similar attempt to calculate the noise power 
spectrum. Their results are puzzling however. For the original KS93 estimator, we expect 
P(k) = constant for a pure noise field. Seitz & Schneider plot k 2 P(k), which, judging 
from the points in their Figure, varies as ~ k 1 for long-wavelengths. Indeed all of the 
power spectra they calculate seem to diverge for long wavelengths. One explanation of this 
may be the lack of resolution in their simulations. As we discussed before, the finite-field 
kernels have a 1/r 2 divergence and a softening of this divergence is required. With coarse 
resolution, the cos (20) dependence in the kernel is not sampled finely enough and there is a 
leakage into the low frequency power. We find that for all of the methods, it is necessary to 
use much finer resolution to compensate for this effect. Indeed, when we employ the same 
resolution they used, we reproduce their curves. With finer resolution however, we obtain 
somewhat different results. 

The main difference among the noise power in all of the methods is that the estimators 
with boundary terms have a much noisier zero-frequency component. This is present in the 
case-c estimators and the Seitz & Schneider algorithm. As discussed, this is because the 
small number of galaxies in the annulus give a random fluctuation in the reconstruction 
which is nearly independent of target position. This is rather interesting. One motivation 
for choosing a case-c type estimator is that for a monotonic cluster profile one would 
expect this estimator to give a higher signal. What we are finding is that the price for 
this is increased noise, so this is largely counterproductive. Similarly, the presence of the 
boundary term in the Seitz & Schneider algorithm introduces a large fluctuation in the long 
wavelength noise power. 

The maximum likelihood direct reconstruction method has behavior similar to the 
other methods at high frequencies. The obvious difference in this method is the presence of 
an enormous and very rapid divergence at small wavenumber. This suggests that for a pure 
noise reconstruction, the most likely solutions determined for the inversion are models with 
a large low frequency components. While a model with, for example, a gradient across the 
field seems very unlikely, this seems to be telling us that it is more likely than any other. 
This reflects the role of the trade-off parameter in our regularization - we can adjust it so 
that the reconstruction is either very smooth (large regularization) or, at the other extreme, 
returns a model that fits each and every data point (and hence has large variations). What 
we find is that for a broad choice of trade-off parameters, the result is very stable and 
relaxes to a model with long wavelength fluctuations. Dialing down the trade-off parameter 
to attempt to reduce the smoothness imposed on the inversion yields no improvement - the 
inversion becomes ill-conditioned. 
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The Fourier method has, not surprisingly, a long wavelength fluctuation as well. This 
simply comes from the fact that we are altering the data to overcome the periodic boundary 
condition problem. We have experimented with cropping the reconstruction area to attempt 
to remove this boundary effect and indeed find some improvement. However, there is some 
leakage into the long wavelength modes for any reasonable trimming. 

The behavior of the maximum probability method looks most promising. We see 
that for reasonable regularization, we obtain a flat noise power spectrum with the lowest 
amplitude of all the methods considered here. It is important to get the regularization 
correct as we see from the top curve in the maximum probability noise power spectrum 
- with too little amplitude in the prior power, the low frequency modes are amplified. 
However, for any given data set with some specified geometry and number of galaxies, we 
can determine the required regularization empirically so that the inversion is well behaved 
with no bias in the recovered signal. 

6. Aperture Mass Measurement 

Two-dimensional images of the mass distribution are nice, but sometimes one would 
like to estimate some gross property such as the mass contained within a given aperture. 
Now one could always take the reconstruction, and integrate over the area in question, 
but then calculating the statistical uncertainty would seem to be a complicated business, 
particularly if the reconstruction is smoothed. In fact one can construct statistics which 
measure the mass within an aperture with a single summation over the galaxies. This is 
simpler, and has correspondingly simple statistical properties. Moreover, these statistics 
can be designed to use only the data outside the aperture, which is useful if one wishes to 
minimize the contamination of the background galaxy population by faint cluster members, 
or to avoid regions where the weak shear approximation may not be valid. 

As discussed in the introduction, from equation the integral in equation (jl]) 
truncated at inner and outer radii r\, r 2 , is equal to 7c(rj) — 7t(r 2 ); if we make r 2 large 
this will tend to ft(ri), and in general gives a lower bound on 7c(ri). This statistic is 
quite different in nature from equation (|3|) which shows how observations on some region 
determine the surface density on the same region (modulo a constant). Here the statistic 
measures, or places a bound on, the mass in some region with observations taken outside 
that region. 

As we show in appendix [A], one can derive equation (|4|) from Gauss' law. This gives 
the mass within an aperture as the integral of the normal component of V0 around the 
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Fig. 6. — The low frequency noise power spectrum for the different estimators. The plots 
were made by placing galaxies at random in the field with an rms polarization of 0.2 per 
component. The calculations for each method employed some small scale softening to prevent 
overflow in the computer. For example, the case b and c kernels were softened with e = 0.02 
(solid-line) and, 0.04 (dashed line) but the long wavelength power is unaffected by this 
smoothing. Methods with boundary terms in the convolution (case-c, SS) tend to have 
larger noise in the zero frequency component. 
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aperture boundary, while the shear measures second derivatives of 0, so it is not surprising 
that if we differentiate the mass enclosed with respect to the aperture radius this picks out 
some average of the shear on the boundary. Here we will first derive equation (f|) from 



equation (II) and then generalize it to non-circular apertures. 



If we take a circular survey of radius p and place the origin at the center then d = p 
and 7* 7a = 7t, so 7* = 7^ and equation ([[4]) becomes 





so for concentric disks we find 

P-2 



1 P 

k{6) — k(j>) = — J dr/r J dip^T (46) 



k( Pi ) - n(p 2 ) = 2 J d\nr(j T ) (47) 
pi 

where (...)= / d(p/2it . . . and which gives a rigorous lower bound on the mass interior to 
r 1; or, in differential form, 

which is equivalent to equation (|). 

Now k can be thought of as a smoothing of k with a top-hat window function, and 
k(pi) — k(P2) is a compensated top hat filter. A general circularly symmetric smoothing of 
k can be expressed as 

J d 2 r W{r)n = J d 2 r W'(r)-y T (49) 

with 

W'{r) = — J dr' r'W{r') - W(r) (50) 
r 

To verify this result, integrate the lhs of equation |49| by parts to obtain an integral involving 
k and then integrate by parts once again and replace derivatives of k by 7^ using equation 



Equation (|50| ) provides a simple way to construct the KS93-style estimator for an 
arbitrarily smoothed reconstruction. Note that even if W(r) has compact support W will 
still have an extended tail oc 1/r 2 unless / dr rW(r) vanishes — i.e. unless we have a 
compensated filter. 



The argument leading to equation (|47|) also works for non-circular apertures. The 
key point is that the form of the kernel in equation ( |I4]) depends only on the shape of 
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the boundary, so if we consider two nested boundaries of the same shape and orientation 
and take the difference in their k(0) — 7c values there is exact cancellation within the inner 
boundary curve and the result will only depend on the data between the curves. If we let 
the outer boundary be p(ip) and the inner one be p(<p)' = ap(<p) with scale factor a < 1, 
then 

r<p 

k(o}-k(1) = -J d 2 n*p 2 /r 2 (51) 

r>ap 

where A is the area inside the boundary p(<p), or in differential form 

d~K 



dlnA 



<7*> (52) 



where the averaging is understood to be over the thin ribbon separating two neighboring 
self-similar curves curves. 

It is straightforward to generalize the aperture mass statistic to inner and outer curves 
of different shapes; simply choose some point within the inner aperture as origin, evaluate 
equation QI^D for the two boundaries and take the difference, but the resulting statistic will 
now depend on the data within the inner aperture. 



7. Discussion 

We have explored the problem of reconstructing the surface density by direct and 
regularized inversion methods. The direct methods employ averaging over line integrals 
of the shear gradients exploiting equation @. While we have restricted attention to 
straight lines, there is still a wide range choice in the geometry of the reference region. 
By performing the integrations by parts analytically we have been able to show explicitly 
how the estimator can be constructed as a weighted sum of galaxy ellipticities, greatly 
simplifying the estimation of the surface density and its uncertainty. We have presented 
analytic forms for the kernel for simple rectangular and circular survey geometries. 

Our analysis has revealed a fundamental problem with attempts to measure k relative 
to its value on the boundary of the data region (as in the first method developed by 
Schneider). We find that this is simply not attainable without introducing very large 
statistical uncertainty and/or bias. We have shown however that one can construct an 
estimator of k relative to its value in some finite strip around the boundary, but the 
performance of this estimator seems to be no better than a simpler estimator where the 
baseline is set to the average over the whole region (our case-b estimator). We have also 
shown how this particular estimator arises naturally if one develops (starting from Gauss' 
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law) a differential relation between the mass enclosed within a boundary and the shear on 
the boundary. We have studied a new estimation technique proposed by Seitz & Schneider 
and have shown that it too possesses a boundary term and suffers from the same limitation 
as the method discussed above. 

With fine grid resolution, we calculated the noise power spectrum for all the of the 
reconstruction methods. We find that, in general, estimators such as the case-c type or 
the Seitz & Schneider algorithm which give large weight to galaxies near the boundary 
have large long wavelength components in the noise power. Among the methods that are 
classified as direct reconstruction, the case-b type estimators have more desirable noise 
properties than any estimator containing the boundary term. 

Using a quite different approach, we have shown how one can construct an exact 
inverse gradient operator in discrete Fourier transform space. This method has some 
distinct advantages: it is extremely fast; it can be applied with a very fine grid to avoid 
losing any spatial resolution (the real resolution limit is set by the noisy nature of the shear 
estimates); and it is very easily extended to the strong-lensing regime. Unfortunately, the 
low-frequency noise power seems to be somewhat higher than for the regularized maximum 
likelihood method for instance. 

We also formulated two regularized inversion methods based on the maximum 
probability and maximum likelihood techniques. As with the direct reconstruction methods, 
we discriminate among the performance of these estimators by their noise properties. We 
find that the maximum likelihood estimator with the usual 9-pt Laplacian regularizer 
contains very large low frequency noise components. The maximum probability method, 
however, seems to be the most promising. With modest regularization, the noise power 
spectrum is flat at all wavelengths, with essentially no bias in the recovered signal. Were 
we given some data and told we could only apply one of the methods described here then 
that is what we would choose. The method is somewhat more costly in computer time than 
e.g. KS93 or the Fourier inverse gradient operator, but not unreasonably so for realistic 
numbers of galaxies. 

We are confident that this paper is not the last word that will be said on the subject. 
The area promises to be rich, simply because there are potentially infinitely many ways 
to recover the surface density from the shear. Crudely speaking this is because we are 
provided with two real scalar fields 71 and 72 which is much more information than we need 
to recover the single scalar k, so there are many ways one can squander this information 
and yet still recover an unbiased answer. All we have done here is to apply a geometric 
construction to generate a family of viable estimators, and we have then compared their 
statistical performance. We have no way of knowing whether there might not be much 
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better estimators out there somewhere, but perhaps the similarity in noise power for the 
methods we have considered is telling us something. 

We would like to thank P. Schneider, C. Seitz and S. Seitz for enlightening and helpful 
conversations. 



A. Aperture Masses from Gauss' Law 

Consider a circular loop of radius r and with azimuthal coordinate (p. At each point 
on the circle construct locally cartesian coordinates n, t along the outward normal and 
tangential directions n, t. By Gauss' law, the mass contained within the loop is 

M(r) = J" d 2 r k(t) = d( P4>,n (Al) 

so 

dM M r r , , /A . 

+ -<b d(fxj> tnn (A2) 



dr r 2 

but (j) tnn = k — 7t where the tangential shear is jt = (4>,tt — (fi,nn)/2. In components in the 
general coordinate frame 7r = —(71 cos 2ip + 72 sin 2ip). Hence 

. . -1 .dM 2M. -1 <$K .. , 

(7r) = 2^ ( ^7 - V } = TdhTr (A3) 

where (•jt) = / rf^7r/27r and k = / d 2 rn/ J d 2 r. This is equivalent to equation (|). 

Let us now generalize this to non-circular apertures. Take some arbitrary point as 
origin and consider the self-similar family of closed curves r(a, A) = ac(A) where A is a 
cyclic parameter around the curve and a is a scale factor. If we consider two neighboring 
curves then vector connecting points with the same A has length 5r = c(X)8a, and if we 
erect orthogonal locally normal and tangential coordinates (n, I), such pairs have separation 
8n = cSa sin if), 51 = 5n cot if> where cosip = c-cf/cc' and where <? = dc/dX and d = |<?|. By 
Gauss' law, 

M(a) = ^fdXc'<f ) , n (r) (A4) 



and 



Af(a') = ^j> rfAc'0 in (f + 5nn + 511) (A5) 



where n, I are unit vectors. Taylor expanding n to 1st order and replacing i7m by 
n — 7+ with parallel component of the shear 7+ defined to be (cf) t u — jnn )/2, and with the 
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orthogonal component of the shear 7 X = —<p >n i, we have 

8M = M(a)5a/a + d\ac'5n(K - 7+ - 7 X cot if?) (A6) 

but the integral here is just the integral of k — 7 + — 7 X cot ip over the area S A in the ribbon 
between the two curves, and we therefore have, for 5K = 5(M/A), 

$k — —7- [ d 2 r( 1+ + 7x cot V) (A7) 

A J 8 A 

or _ 

= ( 7+ + 7xCotV) (A8) 

where the averaging is understood to be over the thin ribbon. 

Now the area is A = A (r 2 /c 2 ) where A is the area enclosed within the curve r = c, so 
we have 

K( a ) 11 

A t 

r>ac 

If we take c to be the perimeter of the data region and let a — > then we obtain a measure 
of k at a point (the origin) relative to the mean over the data region and the result is 
precisely equivalent to equation For finite a we obtain the mass enclosed within the 



7t(l) = — / rf 2 r( 7+ + 7x cot ^)c 2 /r 2 (A9) 
^0 J 



aperture ac(A) as an integral over the data lying outside the aperture just as in §|6|. 

The alternative derivation given in this section is not entirely pointless, however, since 
because of the redundancy in the data — we appear to have a two component field 7 Q (r) 
but in reality both 71 and 72 are derived from a single scalar surface potential function 
<f)(r) — there are many different estimators which measure the same physical quantity 
but which have different noise characteristics. A simple example of this is equation (|4]) to 
which we could, if we were perverse, add any multiple of the integral of 7 X around the loop 
which physically must vanish and so, with real data, will add pure noise. It was therefore 
not entirely a foregone conclusion that these two different approaches would give the same 
result, but in fact they do. 
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